A Decision Boundary based Discretization Technique using Resampling

Authors

  • Taimur Qureshi
  • Djamel A Zighed
Abstract

Many supervised induction algorithms require discrete data, even though real data often comes in both discrete and continuous formats. Quality discretization of continuous attributes is an important problem that affects the speed, accuracy and understandability of induction models. Usually, discretization and other statistical processes are applied to subsets of the population, as the entire population is practically inaccessible. For this reason, we argue that a discretization performed on a sample of the population is only an estimate of the discretization of the entire population. Most existing discretization methods partition the attribute range into two or several intervals using a single cut point or a set of cut points. In this paper, we introduce a technique that uses resampling (such as the bootstrap) to generate a set of candidate discretization points, thereby improving discretization quality by providing a better estimate with respect to the entire population. The goal of this paper is thus to observe whether the resampling technique can lead to better discretization points, which opens up a new paradigm for the construction of soft decision trees.
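The resampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an entropy-minimizing binary split as the base discretizer (the abstract does not name one), and the function names `best_cut_point` and `bootstrap_cut_points`, the `n_resamples` and `seed` parameters, and the toy data are all hypothetical. Each bootstrap resample yields one cut point, and the collection of cut points approximates how the boundary would vary across samples of the population.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut_point(values, labels):
    """Return the binary cut point minimizing weighted class entropy, or None.

    Assumption: an entropy-based split stands in for the base discretizer.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_score, best_cut = float("inf"), None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no boundary can be placed between identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_score, best_cut = score, (lo + hi) / 2
    return best_cut


def bootstrap_cut_points(values, labels, n_resamples=200, seed=0):
    """Collect one best cut point per bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(values)
    cuts = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        cut = best_cut_point([values[i] for i in idx], [labels[i] for i in idx])
        if cut is not None:  # skip degenerate resamples with a single distinct value
            cuts.append(cut)
    return cuts


if __name__ == "__main__":
    # Hypothetical toy attribute with an overlapping class boundary.
    values = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
    labels = ["a", "a", "a", "b", "a", "b", "b", "b"]
    cuts = bootstrap_cut_points(values, labels)
    # The frequency of each candidate suggests how stable that boundary is.
    for cut, count in sorted(Counter(cuts).items()):
        print(f"cut at {cut}: {count} resamples")
```

A single sample would give one hard cut point. The spread of the bootstrap candidates is one plausible way to describe the uncertainty around that point, which is the kind of information a soft decision boundary could use.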


Similar resources

Stratified and Un-stratified Sampling in Data Mining: Bagging

Stratified sampling is often used in opinion polls to reduce standard errors, and it is known as a variance reduction technique in sampling theory. The most common resampling approach is based on bootstrapping the dataset with replacement. A main purpose of this work is to investigate extensions of the resampling methods in classification problems, specifically we use decision trees, fr...


Effects of Slip Condition on the Characteristic of Flow in Ice Melting Process

In this paper a laminar flow of water on an ice layer subjected to a slip condition is considered numerically. The paper describes a parametric mathematical model to simulate the coupled heat and mass transfer events occurring in moving boundary problems associated with a quasi steady state steady flow process. The discretization technique of the elliptic governing differential equations of mas...


Credit Card Fraud Detection using Data mining and Statistical Methods

Due to today’s advancement in technology and businesses, fraud detection has become a critical component of financial transactions. Considering vast amounts of data in large datasets, it becomes more difficult to detect fraud transactions manually. In this research, we propose a combined method using both data mining and statistical tasks, utilizing feature selection, resampling and cost-...


Three Dimensional Modeling of the DC Potential Drop Method Using Finite Element and Boundary Element Analysis

The finite element method (FEM) is a domain technique of solving the underlying governing equation in the region. The solution obtained in the total region is ideal to study energy/defect interactions, but the extensive discretization demands vast computer resources. On the other hand, the potential savings in computation resources, due to a limited surface or boundary discretization, is the pr...


Non-linear Thermo-mechanical Bending Behavior of Thin and Moderately Thick Functionally Graded Sector Plates Using Dynamic Relaxation Method

In this study, nonlinear bending of solid and annular functionally graded (FG) sector plates subjected to transverse mechanical loading and thermal gradient along the thickness direction is investigated. Material properties are varied continuously along the plate thickness according to power-law distribution of the volume fraction of the constituents. According to von-Karman relation for large ...




Publication date: 2009